1987 




NASA/ASEE SUMMER FACULTY RESEARCH FELLOWSHIP PROGRAM

MARSHALL SPACE FLIGHT CENTER 
THE UNIVERSITY OF ALABAMA IN HUNTSVILLE 


PATTERN RECOGNITION TECHNIQUES 
FOR 

FAILURE TREND DETECTION 
IN 

SSME GROUND TESTS 


Prepared By:                A. Choudry
Academic Rank:              Professor
University and Department:  University of Alabama in Huntsville
                            Applied Optics

NASA/MSFC:
Laboratory:                 Structures and Dynamics
Division:                   Control Systems
Branch:                     Mechanical Systems Control

NASA Colleague:             Harry A. Cikanek
Date:                       August 31, 1987
Contract No:                NGT-01-008-021





Abstract 


The Space Shuttle Main Engine (SSME) is a very complex power plant and 
plays a crucial role in Shuttle missions. To evaluate SSME performance,
1200 hot-fire ground tests have been conducted, varying in duration from 
0 to 500 secs. During the test about 500 sensors are sampled every 20ms 
to measure the various parameters. The sensors are generally bounded by 
'red-lines' so that an excursion beyond the red-line could lead to 
premature shutdown by the operator. In 27 tests, guided by the red-lines, 
it was not possible to effect an orderly premature shutdown. These tests 
became major incidents where serious damage to the SSME and the test 
stand resulted. In this study we have investigated the application of 
pattern recognition techniques to detect SSME performance trends that 
lead to major incidents. Based on the sensor data a set of (n) features is 
defined. At any time, during the test, the state of the SSME is given by a 
point in the n-dimensional feature-space. The entire history of a given 
test can now be represented as a trajectory in the n-dimensional feature 
space. Portions of the 'normal' trajectories and the failed test 
trajectories would lie in different regions of the n-dimensional feature 
space. The feature space can now be partitioned into regions of 
normal-tests and failed tests. In this manner it is possible to examine the 
trajectory of a test in progress and predict if it is heading into the
'normal-region' or the 'failure-region' of the n-dimensional feature space. 
In this study we have developed techniques to extract features from 
ground test data, as supplied by Rocketdyne, and develop feature space 
trajectories for the tests. The initial results, as presented here, look very
promising.

Introduction




The Space Shuttle Main Engine (SSME), based on hydrogen-oxygen
combustion, is a very complex power plant employing numerous pumps,
valves and ducts, as shown in Fig. 1. During a ground test about 500 sensors
are used to monitor the state of the SSME. Some of these sensors are used for
the closed-loop control of the SSME and are connected to a computer system,
the 'Engine Controller', as shown in Fig. 2.

Fig. 1 SSME Propellant Flow Schematic

Fig. 2 Engine Controller & Hot Gas Manifold

There are 3 different data acquisition systems used to collect the sensor
data (1,2), namely,

-- Command and Data Simulator (CADS) 

-- Facility Recording (FR), and 

-- Analog High Frequency Recording (AHFR) 

In Fig. 3, the salient points of these systems are shown. The engine
controller uses 16-bit computations on 12-bit data words to perform
closed-loop operation of the SSME. For the SSME Anomaly and Failure Detection
(SAFD) analysis, as reported in (2), the CADS and FR data provide the bulk
of the input.

In all, about 1200 hot-fire tests have been conducted on the SSME. In 27
tests the SSME went out of control and serious damage to the engine and
the test stand resulted. A summary of some of the salient points of the
ground tests is given in Table 1.

Considering that the replacement cost of an engine is ~$50M, it is highly
desirable to develop some technique for detecting failure trends which
would allow an orderly shutdown of the SSME and thereby prevent a
major incident (3). In (2) and (3) various techniques for failure detection
have been suggested, including the following,

-- Generalized Likelihood Ratio (GLR)

-- Generalized Likelihood Test (GLT)

-- Voting

-- Confidence Region Tests

-- Kalman Filters

-- Parameter Estimation

-- Jump Processes

-- Pattern Recognition.

The success of a technique will be determined by:

-- detecting the fault fast enough to allow an orderly shutdown

-- identifying the technical nature of the fault.

Fig. 3 SSME Data Acquisition System

TABLE 1. GROUND TEST SUMMARY


-- ~1200 HOT-FIRES

-- 27 MAJOR INCIDENTS

-- TEST DURATION 0-500 SEC.

-- 300-500 SENSORS MONITORED

-- SAMPLING RATE 50 Hz

-- DATA WORD 12 bits

-- DATA TRANSFER RATE 0.5-1 MHz

-- DATA VOLUME 0.1-1 Gbits
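
As a rough cross-check of the Table 1 figures, the data volume of a full-length test can be estimated from the sensor count, sampling rate and word size. The sketch below assumes, for illustration only, that every monitored sensor is sampled at 50 Hz with 12-bit words for the whole 500 sec; the function name is hypothetical.

```python
# Rough data-volume estimate from the Table 1 figures.
# Assumption (ours, not the report's): every sensor is sampled at the
# full rate, with one 12-bit word per sample, for the whole test.

def data_volume_bits(n_sensors, sample_rate_hz, word_bits, duration_s):
    """Total number of bits produced by n_sensors over duration_s seconds."""
    return n_sensors * sample_rate_hz * word_bits * duration_s

low = data_volume_bits(300, 50, 12, 500)   # 300 sensors
high = data_volume_bits(500, 50, 12, 500)  # 500 sensors
print(low / 1e6, "Mbits to", high / 1e6, "Mbits")
```

Under these assumptions a 500 sec test yields 90 to 150 Mbits, which sits at the lower end of the 0.1-1 Gbits range in Table 1.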

The performance of a SAFD system can be described by the 'weighted truth-table'
in Fig. 4, which shows the probability W for various actions.

Ideally, W1=W4=1 & W2=W3=0
Fig. 4 SAFD performance matrix.

Note that W2, being the probability of a false alarm, should be zero;
however, a small value, say 1%, may be acceptable. On the other hand W3,
being the probability of a miss, should indeed be zero, just as W4 should be
1. Various alternatives have been considered for implementing such a
SAFD. We shall consider the use of Pattern Recognition (PR) techniques for
SAFD. It should also be pointed out that much of the data processing in PR,
as described below, can also be used for the other vital activities
envisaged for the future systems, namely, real-time control, health
assessment and condition monitoring (4,5).
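
The four weights can be estimated from a set of tests whose outcome is known. The sketch below is illustrative only: the text identifies W2 as the false-alarm probability and W3 as the miss probability, and the code assumes that W1 and W4 are their complements (no alarm on a normal test, alarm on a failed test). The function name and the sample counts are made up for the example.

```python
def truth_table_weights(outcomes):
    """Estimate (W1, W2, W3, W4) from (failed, alarmed) pairs, one per test.

    Assumed mapping: W1 = P(no alarm | normal), W2 = P(alarm | normal),
    W3 = P(no alarm | failure), W4 = P(alarm | failure).
    """
    outcomes = list(outcomes)
    normal = [alarmed for failed, alarmed in outcomes if not failed]
    failure = [alarmed for failed, alarmed in outcomes if failed]
    if not normal or not failure:
        raise ValueError("need at least one normal and one failed test")
    w2 = sum(normal) / len(normal)    # false alarms among normal tests
    w4 = sum(failure) / len(failure)  # detections among failed tests
    return 1.0 - w2, w2, 1.0 - w4, w4

# Made-up counts: 100 normal tests with 1 false alarm,
# 4 failed tests of which 3 raised an alarm.
sample = ([(False, False)] * 99 + [(False, True)]
          + [(True, True)] * 3 + [(True, False)])
w1, w2, w3, w4 = truth_table_weights(sample)
print(w1, w2, w3, w4)
```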

Pattern Recognition (PR)

The fundamental premise for applying PR techniques is the observation
that when systems fail due to internal causes there are always some
warning signs that precede the event. Furthermore, the progression of a
system from normal operating mode to anomalous (failure) mode does not
happen at random but follows a pattern which can be analysed and
explained. The object of the PR technique described here is to identify the
patterns that have led to failures and use this knowledge to look for
warning signs in future tests and predict failures well in advance of their
occurrence.

The current practice is based on red-lining the sensor outputs. The 
red-lining of n-sensors can be easily explained in terms of a polyhedron in 
n-dimensions as shown in Figs. 5(a,b,c). Each sensor is assigned a lower- 
and an upper-bound value for 'normal' operation and these define the two 
'red-lines' for that sensor. For a 3-sensor case the state of the system, at 
a given time, can uniquely be defined by a point in the rectangular 
prismatic region of the S1-S2-S3 Space (S-Space), Fig. 5c. The collection 
of these state-points at successive times would define a trajectory in the 
S-Space. All the possible normal runs of the system would then be given by 
trajectories that lie entirely within the 'red-lined' rectangular prism as 
shown in Fig. 6. In principle, any trajectory that tends to approach a
boundary and exit to the outside region is an indication of an imminent
failure.

One can learn to detect the failure trends by examining the data of the 27
tests that resulted in failure and comparing them with the normal test data. It
is quite possible that the failure trajectories will reveal their different
character (as compared to normal trajectories) even before coming close
to the red-line polyhedron boundary as shown in Fig. 7.
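
The fixed red-line test amounts to checking whether each state point lies inside an n-dimensional box. A minimal sketch, assuming each state is given as a tuple of sensor readings and the red-lines as tuples of lower and upper bounds (all names and numbers are illustrative):

```python
def inside_red_lines(state, lower, upper):
    """True if every sensor reading lies between its two red-lines."""
    return all(lo <= s <= hi for s, lo, hi in zip(state, lower, upper))

def first_excursion(trajectory, lower, upper):
    """Index of the first state point outside the red-line box, or None."""
    for t, state in enumerate(trajectory):
        if not inside_red_lines(state, lower, upper):
            return t
    return None

# Three-sensor example with made-up bounds and readings.
lower = (0.0, 0.0, 0.0)
upper = (1.0, 1.0, 1.0)
run = [(0.5, 0.5, 0.5), (0.9, 0.5, 0.5), (1.2, 0.5, 0.5)]
print(first_excursion(run, lower, upper))
```

Note that this test looks at one time frame at a time; it says nothing about how the trajectory reached the boundary.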

Fig. 5 Multi-Sensor

Fig. 6 System Trajectories in Sensor Space

Fig. 7 Failure & Normal Mode Trajectories (panels: Failure Mode, Normal Mode)

There are two points that have to be considered in this context, namely,

1. Straightforward fixed red-lines for a sensor are adequate only for
very special cases where no coupling among the sensors exists, i.e. the
red-line for a given sensor is independent of the values of all the other
parameters as measured by the rest of the sensors. Let r_k be the red-line
for the kth sensor, then

r_k = c_k, where the c's are constants

The software red-lines can be defined by replacing the c's by functions f_k so
that,

r_k = f_k(S_1, S_2, ..., S_n), where S_k is the kth sensor reading

In real-time this implies that as the test is progressing the readings S_k
are used to calculate the various r_k's through the f_k's. Not only can this
become computationally quite cumbersome, but the explicit form of f_k itself
has to be known, perhaps from a simulation model of the system. In
principle, it is simple to build the simulation model in a modular manner
(6), however, the ad hoc nature of such models leads to different control
and real-time simulation models. By such models it is quite possible to
determine most of the f_k's, however, some crucial gaps may exist in this
knowledge since not all the failure mechanisms are well understood.

2. Even if the f_k's are known and the soft red-lines can be determined,
there is yet another serious problem. In principle all red-lines, soft or
otherwise, are based on a single time frame of the system without
considering how the system got to the state represented by the time
frame. Questions such as whether the system has reached its present state
through a transient, a slow drift, excessive noise or under a closed-loop
command are not considered by red-line methods. The method
proposed here considers the entire system trajectory and compares it with
other trajectories to detect failure-prone trajectories.
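
A soft red-line check of the form r_k = f_k(S_1, ..., S_n) can be sketched as follows. The limit functions here are invented purely for illustration (a pressure limit that scales with a power-level reading); the actual f_k's would have to come from a system model, as noted above.

```python
def soft_red_line_violations(readings, limit_functions):
    """Return the indices k of sensors outside their soft red-lines.

    readings: list of sensor values S_1..S_n for one time frame.
    limit_functions: list of f_k; each takes the full list of readings
    and returns the (lower, upper) red-lines for sensor k.
    """
    violations = []
    for k, f_k in enumerate(limit_functions):
        lower, upper = f_k(readings)
        if not lower <= readings[k] <= upper:
            violations.append(k)
    return violations

# Hypothetical limits: sensor 0 (power level, %) has fixed red-lines,
# sensor 1 (a pressure) has red-lines proportional to sensor 0.
limits = [
    lambda s: (0.0, 110.0),
    lambda s: (20.0 * s[0], 35.0 * s[0]),
]
print(soft_red_line_violations([100.0, 3000.0], limits))
print(soft_red_line_violations([60.0, 3000.0], limits))
```

The same pressure reading passes at one power level and fails at the other, which a fixed red-line cannot express. The check still uses a single time frame, so it shares the second limitation described above.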

The PR technique we propose to employ here has two important steps,

-- extension of the sensor-space into the feature-space
-- segmentation of the feature-space into normal- and failure-regions

Feature Space


The sensor space discussed above has two major drawbacks, namely,

- For a truly multi-sensor system such as the SSME the total amount of data
is too large (about 100 Mbits) and can become too unwieldy for real-time
processing.

On the other hand most of the data is of routine nature and a tremendous 
amount of data compression can be achieved by isolating and analysing 
only the deviations from the norm or the steady state. The norms can be 
defined as those values which can be calculated or predicted (assuming 
normal SSME operation) from a few key parameters e. g. power level, MCC 
pressure, throttle position etc. In the simplest case, only the deviations in 
sensor values, as compared to a moving average defined over a certain 
interval, are to be used for further analysis. This may even include 
deviations caused by closed- or open-loop control commands as may 
happen during throttling. 

- The sensor space, as based only on the sensor values, may not highlight 
the features important for SAFD. 

This is based on the fact that the raw sensor readings, along with their
red-lines, may not themselves be good indicators of impending failure.
Further processing is often required to calculate features which are
directly related to the failure modes. In Fig. 8 we show some of the
features that can be defined for a given sensor. Starting with the raw
value one can calculate first an average over a certain interval and then
the deviation from it. From these one can also calculate the signal to noise
ratio S/N, which could be another feature. To detect drifts one can also
calculate the local gradients as another feature. Similarly, the Fourier
Transform of the signal (or the deviation), over a given time window, can
be another feature, as shown in Fig. 8. One can also define 'compound'
features involving data from more than one sensor. Thus, if needed, the
net thermal flux, which may not be measured by a single sensor, can be
calculated from the pressure, flow velocity and temperature as measured
by sensors in the MCC and it can be used as a feature for failure detection.
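
The single-sensor features listed above can be sketched for one time window as follows. This is an illustration under stated assumptions, not the program used in this study: S/N is taken as the mean over the RMS deviation, the local gradient as a least-squares slope, and the Fourier feature as the magnitude of a direct DFT of the deviation. The function name is hypothetical.

```python
import cmath
import math

def window_features(samples, dt):
    """Features of one sensor over one window of raw readings.

    samples: list of raw readings; dt: sampling interval in seconds.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(samples) / n
    deviations = [s - mean for s in samples]
    rms = math.sqrt(sum(d * d for d in deviations) / n)
    # Assumed convention: S/N = |mean| / RMS deviation.
    snr = abs(mean) / rms if rms > 0 else math.inf
    # Least-squares slope of the readings against time, to expose drift.
    t_mean = (n - 1) * dt / 2
    num = sum((i * dt - t_mean) * d for i, d in enumerate(deviations))
    den = sum((i * dt - t_mean) ** 2 for i in range(n))
    gradient = num / den
    # Magnitude spectrum of the deviation by direct DFT (short windows only).
    spectrum = [
        abs(sum(d * cmath.exp(-2j * math.pi * k * i / n)
                for i, d in enumerate(deviations))) / n
        for k in range(n // 2 + 1)
    ]
    return {"mean": mean, "rms": rms, "snr": snr,
            "gradient": gradient, "spectrum": spectrum}

# Synthetic ramp sampled every 20 ms: the slope is 0.5 / 0.02 = 25 per second.
ramp = window_features([10 + 0.5 * i for i in range(8)], dt=0.02)
print(ramp["gradient"])
```

A compound feature would combine such values from several sensors in the same way.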

Based on the above discussion, the sensor space is replaced by a
feature-time space, where a feature is defined as the deviation from the 
norm or steady-state as calculated from some key parameters or by 
averaging over a specified interval. The state of SSME, at any given time 
will thus be represented by a state point in the feature space. In Fig. 9 a 
normal SSME run is shown in a two-feature space. In a normal run, all the 
state points cluster around the time axis as shown, since no large 
deviations are encountered. 

Fig. 8 Sensor Values & Features

Fig. 9 Feature Space Representation of Normal SSME Test

Segmentation


Segmentation is the process of partitioning the feature space into
clusters that can be identified with definite states of the system, e.g.
pre-failure or normal. An idealised illustration of this is shown in Fig. 10
where the entire feature space has been projected along the time axis. All
the points representing the normal runs should lie in a small region,
cluster 1, around the origin. During normal runs there may also be large
deviations caused by genuine excursions such as throttling. Such states of the
system might show up as another region, cluster 2. It is anticipated that
the deviations due to the failure modes will be of a different nature and
hopefully form another distinct region, cluster 3. Another view of the
same situation is depicted in Fig. 11 where an entire run is represented by
a trajectory. A steady run trajectory would then lie entirely within
cluster 1 whereas a controlled excursion in a run might cause the
trajectory to migrate to cluster 2, but eventually return to cluster 1 after
the steady state has been reached.

In practice, the situation may not be quite so clear-cut; the clusters may
not have such well defined boundaries and they may overlap. A number of
powerful statistical techniques are available to locate cluster boundaries
in such cases. It is also possible to assign to each state point the
probability of membership to a given cluster. One can also define a
distance metric in the feature space to group points in clusters. In Table 2
(7) some of the commonly employed distance measures and the associated
error bounds are shown.
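
As a minimal illustration of grouping state points by a distance metric, the sketch below assigns a point to the nearest cluster centroid using the Euclidean distance and derives a normalised membership weight from the squared distances. Both choices are assumptions made for the example; they are not among the measures of Table 2, and the cluster labels and training points are invented.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def distance(a, b):
    """Euclidean distance between two feature-space points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_cluster(point, centroids):
    """Label of the centroid closest to the point."""
    return min(centroids, key=lambda label: distance(point, centroids[label]))

def memberships(point, centroids, scale=1.0):
    """Normalised weights exp(-(d/scale)^2); an ad hoc membership measure."""
    d2 = {label: (distance(point, c) / scale) ** 2
          for label, c in centroids.items()}
    d2_min = min(d2.values())  # shift so the largest weight is exactly 1
    raw = {label: math.exp(-(v - d2_min)) for label, v in d2.items()}
    total = sum(raw.values())
    return {label: w / total for label, w in raw.items()}

# Invented two-feature training points for the three clusters of Fig. 10.
training = {
    "normal": [(0.0, 0.1), (0.1, -0.1), (-0.1, 0.0)],
    "excursion": [(2.0, 0.0), (2.2, 0.2)],
    "pre-failure": [(0.0, 3.0), (0.2, 3.2)],
}
centroids = {label: centroid(pts) for label, pts in training.items()}
state = (0.1, 2.5)
print(nearest_cluster(state, centroids))  # prints "pre-failure"
print(memberships(state, centroids))
```

Applying `nearest_cluster` to successive state points of a run gives the sequence of clusters its trajectory passes through.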

Fig. 10 Segmentation of the Feature Space

Fig. 11 Trajectories across the Clusters

Table 2. Distance Measures and Error Bounds
(columns: Name; Expression; m-Class Bounds; Two-Class Bounds. The expressions
and bounds are not legible in the source; only the names are listed.)

Bayes error probability P_e

1) Equivocation or Shannon entropy
2) Average conditional quadratic entropy [Vajda (1970)]
3) Bayesian distance [Devijver (1974)]
4) Minkowski measures of non-uniformity [Toussaint (1973)]
5) Bhattacharyya bound [see Kailath (1967)]
6) Chernoff bound [see Kailath (1967)]
7) Kolmogorov variational distance [see Kailath (1967)]
8) Generalized Kolmogorov distance [Devijver (1974), ... and Fu (1973)]
9) A family of approximating functions [Ito (1972)]
10) The Matusita distance [see Kailath (1967)]

References cited in the bounds columns: Cover and Hart (1967), Devijver (1974),
Ito (1972), Hellman and Raviv (1970).

Notation (partly legible): the bounds refer to an m-class infinite-sample
nearest-neighbor risk and to a k-nearest-neighbor risk.


The steps employed in the above technique are,

- definition of the features and construction of the feature space

- plotting of ground test data (of both normal and failure tests) as
trajectories in the feature space

- segmentation of trajectories into failure and normal runs.

Implementation of these steps in practice is discussed below.

Results and Conclusions

The data from a run is stored on a number of magnetic tapes. The data for
a short interval (10-100 sec.) and from a few important sensors is combined
into a single tape file. This tape file is read into a disk file which can be
accessed by application programs. Fig. 12 shows the header, or the
Run-Log, of the disk data file. This data, as can be seen from the first line,
is for the time period 320 to 392 secs. of run #901-364, which
resulted in a failure. It also shows the sensor PID#, the engineering unit used
and the SSME component mnemonic.


9010364R*11   320.00000   392.15000

367  AP  MCC H.G. INJ PR
940  GP  HPFP CLNT LN PR
395  GP  MCC OX INJ PR
410  AP  FPB PC NFD
480  GP  OPB PC
459  AP  HPFP DS PR NFD
764  RH  HPFP SPD NFD
854  GP  FAC OX FH DS PR
858  GP  ENG OX IN PR 1
878  GP  HX INT PR
879  IC  HX INT T
883  DP  HX VENT DP


Fig. 12 Data File Header

An interactive, menu-driven program has been written to process the data
and extract the features. In Fig. 13 the opening MENU of the program is
shown. Various types of operations are available by choosing the
appropriate code; these operations include both recursive and
non-recursive filters, data compression, Logical Operations, FFT,
Look-Up-Tables etc.

TYPE SEQ#'S OF COMPONENTS, END WITH 0
1
0
TYPE # OF DATA POINTS TO READ (LT.3000). 0=EXIT
3300

CHOOSE OPTION BY TYPING #

DATA COMPRESS      =1
NON-REC FLTR       =2
RECURSIVE FILTER   =3
NRH-OPRNS          =4
FFT                =5
EXIT               =0


Fig. 13 Operations MENU

The first step is to read the raw data for a sensor by choosing the appropriate
PID#. In Fig. 14 the data for PID#367, for the entire duration of 320 to 392
seconds, is shown.


Fig. 14 MCC H.G. INJ PR, PID #367, RUN 901-364 (time axis: 320 to 390 sec)

From the above figure it is clear that the data has some structure in the
form of some distinct features; however, the noise level is high enough to
mask them. The first step we have taken is to reduce the 'observational
sampling rate' through a moving average. This is done in the following three
steps:

1. Select a window size N = 0, 1, 2, ...

Let M = 2N + 1

2. Form signal averages \bar{S}_k from the raw signal S_i,

\bar{S}_k = (1/M) \sum_{i=k-N}^{k+N} S_i

where k = N+1, 2N+1, 3N+1, ..., rN+1, ... and
i = k-N, k-N+1, ..., k+N

3. Replace the original signal S_i by \bar{S}_k. The sampling rate in the new
signal, \bar{S}_k, is reduced by a factor of N.
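
The three steps can be sketched as below. The code follows the indexing given in the steps (windows of M = 2N + 1 samples centred every N samples) but uses zero-based list indices; the function name is hypothetical and N = 0 is rejected because it would give no rate reduction.

```python
def reduce_sampling_rate(signal, n):
    """Average over windows of M = 2n + 1 samples centred every n samples.

    Returns the list of window averages; its sampling rate is lower than
    that of the input by a factor of n.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    m = 2 * n + 1
    averaged = []
    centre = n  # zero-based index of the first centre, k = N + 1
    while centre + n < len(signal):
        window = signal[centre - n:centre + n + 1]
        averaged.append(sum(window) / m)
        centre += n  # next centre: k = 2N + 1, 3N + 1, ...
    return averaged

print(reduce_sampling_rate(list(range(10)), 2))  # prints [2.0, 4.0, 6.0]
```

Successive windows overlap by N + 1 samples, so each output value shares data with its neighbours.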

In Fig. 15 the sampling rate of the data in Fig. 14 has been reduced with N = 9,
i.e. compressed by a factor of 9. The program allows an interactive choice
of N. The data in Fig. 15 still seems to have some noise, which can be
removed by various filtering techniques. As an illustration, Fig. 16 shows
the result of applying a non-recursive filter to the data of Fig. 15.

The data in Fig. 16 seems to have two distinct features, namely, a
predominant frequency and a 'drifting background'. To separate these two
components one can determine local averages over an interval larger than
the hi-freq. wavelength, as shown by the background line in Fig. 17. From
this one can determine the zero-crossing points. A smooth curve can be
fitted to these points to determine the background, as shown in Fig. 18.

The background level, as found in Fig. 18, can now be subtracted from Fig. 16
to give the hi-freq. component of the signal, as shown in Fig. 19. This
signal can further be 'smoothed' to yield a 'cleaner' hi-freq. signal, as
shown in Fig. 20.
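
The separation into a drifting background and a hi-freq. component can be sketched as follows. This is a simplified stand-in: instead of locating zero-crossing points and fitting a smooth curve through them, the background is taken directly as a long centred moving average. The function names and window parameters are hypothetical.

```python
def centred_average(signal, half_width):
    """Centred moving average; the window shrinks near the two ends."""
    out = []
    for i in range(len(signal)):
        lo = max(0, i - half_width)
        hi = min(len(signal), i + half_width + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def split_drift_and_jitter(signal, background_half_width, smooth_half_width):
    """Return (background, high_freq, smoothed_high_freq).

    background: long moving average, standing in for the fitted drift curve.
    high_freq: signal minus background.
    smoothed_high_freq: high_freq after a short moving average.
    """
    background = centred_average(signal, background_half_width)
    high_freq = [s - b for s, b in zip(signal, background)]
    smoothed = centred_average(high_freq, smooth_half_width)
    return background, high_freq, smoothed

# Synthetic example: a slow ramp with an alternating jitter on top.
raw = [100 + 0.1 * i + (1 if i % 2 else -1) for i in range(40)]
drift, jitter, clean = split_drift_and_jitter(raw, 5, 1)
```

The background half-width should span more than one period of the hi-freq. component so that the jitter averages out of the drift estimate.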

Fig. 15 Reduced Sampling Rate Data, PID#367

Fig. 16 High-Pass Filtered Data, PID#367 (time axis: 320 to 390 sec)

Fig. 17 Zero Crossing Points, PID#367

Fig. 18 Background Trend, PID#367

Fig. 19 High Frequency Data, PID#367

Fig. 20 Smoothed Hi-Freq Data, PID#367

The original signal of Fig. 14 can now be said to have two distinct features,
as represented by Figs. 18 & 20. The former is a slow drift with
plateaus, whereas the latter is a high frequency jitter which, if needed,
can be Fourier analysed. This drift and the high frequency can now be
taken as features for representation in the feature space as discussed
earlier.

The above steps can now be repeated for the other sensors. The
accumulation of all such sensors and their features can be put in a single
time-feature file to construct trajectories. The present computer
facilities, with their limited memory allocation and lack of on-line
graphics, did not allow such an implementation. A demonstration of this,
however, was realised off-line at the UAH facility and presented as a video
film at MSFC.

From this study, the design of a comprehensive system to analyse the
SSME ground test data has been made. The system should consist of:

1. A double density (6250/1600 bpi) tape drive interfaced to the host
VAX/VMS environment.

2. An on-line RGB graphics display.

3. At least 20M disk memory.

4. A graphics kernel with hooks to the application environment.

5. A two-tiered version of software for interactive development and
macro-oriented operation.

Conclusions


- A feature space description of the SSME ground test data has been
realised.

- An interactive program has been written to extract features from the
ground test data.

- Techniques of pattern recognition have been identified to measure the
deviations from the normal runs.

- A design of a more comprehensive program has been made to:

A. Survey a large number of normal runs (about 50), and

B. Survey all the failed runs (27) and compare them with the above.

- Considering that no overall comprehensive review of either the normal
or the failed runs exists, it is highly recommended that an analysis
environment of the type discussed above be implemented.